Weka Java code for k-means clustering

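I wrote Java code for malware detection using a data mining technique (k-means clustering). I sniff packets with the jnetpcap library to analyze them, and I run the k-means clustering algorithm for the first packet inside the nextPacket method.

The algorithm works well the first time: it creates an Instances object with specific attributes, and the clustering is based on those attributes. On the next packet, however, the code cannot run again because it throws an exception.

The code I am using is: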

for(int dim = 0; dim < numDimensions; dim++)
{
    Attribute current = new Attribute("Attribute" + dim , dim);

    if(dim == 0)
    {
        for(int obj = 0; obj < numInstances; obj++)
        {
            // instances.add(new SparseInstance(numDimensions));
            instances.add(new DenseInstance(numDimensions) );
        }
    }

    for(int obj = 0; obj < numInstances; obj++)
    {
        instances.get(obj).setValue(current, (Double)data[dim+1][obj]);
    }

    atts.add(current);
}

Instances newDataset = new Instances("Dataset" , atts, instances.size());       //this is the line that throws the exception

for(Instance inst : instances)
    newDataset.add(inst);
SimpleKMeans kMeans = new SimpleKMeans();
kMeans.setNumClusters(2);
// kMeans.setMaxIterations(4);
kMeans.buildClusterer(newDataset);
//   int clusterNumbers;
// clusterNumbers=kMeans.numberOfClusters();
for (int j=0;j<numInstances;j++)
{ 
    int classif=kMeans.clusterInstance(newDataset.get(j));
    //  double []distr=kMeans.distributionForInstance(newDataset.firstInstance());
    System.out.println(classif);
    //   System.out.println(distr[0]);
    //  System.out.println(distr[1]);
    ArrayList<Double> temp5=flowFeatures.get((JFlowKey)data[0][j]);
    if (classif==0)
    {
        // instances0.add(newDataset.get(j));
         instance0FlowFeatures.put((JFlowKey)data[0][j], temp5);
    }
    else if(classif==1)
    {
        //instances1.add(newDataset.get(j));
        instance1FlowFeatures.put((JFlowKey)data[0][j], temp5);
    }
}

The exception I see is:

java.lang.IllegalArgumentException: Attribute names are not unique! Causes: 'Attribute0' 'Attribute1' 'Attribute2' 'Attribute3' 'Attribute4' 'Attribute5' 'Attribute6' 'Attribute7' 'Attribute0' 'Attribute1' 'Attribute2' 'Attribute3' 'Attribute4' 'Attribute5' 'Attribute6' 'Attribute7'

Can anyone help me?


1 answer

  1. # Answer 1

    Think of attributes as the columns of a table: each one must be created only once.
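
    The exception lists Attribute0 to Attribute7 twice, which suggests that the attribute list is shared across packets and never reset, although the question does not show where `atts` is declared. The plain-Java sketch below illustrates that assumption without any Weka classes: `requireUniqueNames` is a hypothetical stand-in for the uniqueness check, not Weka's implementation, and the two `namesForPacket...` helpers are illustrative names.

    ```java
    import java.util.ArrayList;
    import java.util.HashSet;
    import java.util.List;
    import java.util.Set;

    // Plain-Java illustration only; no Weka classes are involved.
    class AttributeNamesSketch {

        // Hypothetical stand-in for a "names must be unique" check.
        static void requireUniqueNames(List<String> names) {
            Set<String> seen = new HashSet<>();
            for (String name : names) {
                if (!seen.add(name)) {
                    throw new IllegalArgumentException(
                        "Attribute names are not unique! Duplicate: " + name);
                }
            }
        }

        // Assumed buggy pattern: one list shared by every packet callback, never cleared.
        static final List<String> sharedNames = new ArrayList<>();

        static List<String> namesForPacketBuggy(int numDimensions) {
            for (int dim = 0; dim < numDimensions; dim++) {
                sharedNames.add("Attribute" + dim);   // grows on every call
            }
            return sharedNames;
        }

        // Fixed pattern: build a fresh list on each call.
        static List<String> namesForPacketFixed(int numDimensions) {
            List<String> names = new ArrayList<>();
            for (int dim = 0; dim < numDimensions; dim++) {
                names.add("Attribute" + dim);
            }
            return names;
        }

        public static void main(String[] args) {
            // The fixed version passes the check for every packet.
            requireUniqueNames(namesForPacketFixed(8));
            requireUniqueNames(namesForPacketFixed(8));

            // The shared list passes for the first packet and fails for the second.
            requireUniqueNames(namesForPacketBuggy(8));
            try {
                requireUniqueNames(namesForPacketBuggy(8));
            } catch (IllegalArgumentException e) {
                System.out.println("second packet failed: " + e.getMessage());
            }
        }
    }
    ```

    If this is what happens in the question's code, declaring `atts` and `instances` as local variables inside the packet handler, or clearing them before the loop, should avoid the duplicate names.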

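    Here is code for one-dimensional data. In this example, I picture a table with a single column, "attr1", and 3 records (instances), to keep the structure simple and easy to follow.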

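            // Imports needed by this snippet (place them at the top of the source file):
            // import java.util.ArrayList;
            // import weka.clusterers.SimpleKMeans;
            // import weka.core.Attribute;
            // import weka.core.DenseInstance;
            // import weka.core.Instance;
            // import weka.core.Instances;
            Attribute attr1 = new Attribute("attr1");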
            ArrayList<Attribute> attrList = new ArrayList<Attribute>();             
            attrList.add(attr1);                
    
            Instances dataset = new Instances("test", attrList, 0);
    
            double[] val1 = new double[] { 1.2};
            double[] val2 = new double[] { 2.2};
            double[] val3 = new double[] { 1.4};
    
            Instance instance0 = new DenseInstance(1.0, val1);
            instance0.setDataset(dataset);
    
            Instance instance1 = new DenseInstance(1.0, val2);
            instance1.setDataset(dataset);
    
            Instance instance2 = new DenseInstance(1.0, val3);
            instance2.setDataset(dataset);
    
            dataset.add(instance0);     
            dataset.add(instance1); 
            dataset.add(instance2); 
    
            SimpleKMeans kmeans = new SimpleKMeans();               
            try {
                kmeans.setPreserveInstancesOrder(true);
                kmeans.setNumClusters(2);
                kmeans.setSeed(2);
                kmeans.setDontReplaceMissingValues(true);
                kmeans.setMaxIterations(10);    // options must be set before buildClusterer to take effect
                kmeans.buildClusterer(dataset);
                Instances instances = kmeans.getClusterCentroids();
                int[] assignments = kmeans.getAssignments();    // requires setPreserveInstancesOrder(true)
                int x = 0;
                for (int assignment : assignments) {
                    System.out.println("data: " + dataset.get(x) + " instance idx: " + x + " centroid value: " + instances.get(assignment));
                    x++;
                }
             }